16 research outputs found

    Declutter and Resample: Towards parameter free denoising

    In many data analysis applications the following scenario is commonplace: we are given a point set that is supposed to sample a hidden ground truth $K$ in a metric space, but it has been corrupted with noise, so that some of the data points lie far away from $K$, creating outliers, also termed ambient noise. One of the main goals of denoising algorithms is to eliminate such noise so that the curated data lie within a bounded Hausdorff distance of $K$. Popular denoising approaches such as deconvolution and thresholding often require the user to set several parameters and/or to choose an appropriate noise model, while guaranteeing only asymptotic convergence. Our goal is to lighten this burden as much as possible while ensuring theoretical guarantees in all cases. Specifically, we first propose a simple denoising algorithm that requires only a single parameter but provides a theoretical guarantee on the quality of the output on general input points. We argue that this single parameter cannot be avoided. We next present a simple algorithm that avoids even this parameter, at the cost of a slight and not unrealistic strengthening of the sampling condition on the input points. We also provide some preliminary empirical evidence that our algorithms are effective in practice.

    Topological analysis of scalar fields with outliers

    Given a real-valued function $f$ defined over a manifold $M$ embedded in $\mathbb{R}^d$, we are interested in recovering structural information about $f$ from the sole information of its values on a finite sample $P$. Existing methods provide an approximation to the persistence diagram of $f$ when geometric noise and functional noise are bounded. However, they fail in the presence of aberrant values, also called outliers, both in theory and in practice. We propose a new algorithm that deals with outliers. We handle aberrant functional values with a method inspired by k-nearest neighbors regression and local median filtering, while the geometric outliers are handled using the distance to a measure. Combined with topological results on nested filtrations, our algorithm performs robust topological analysis of scalar fields in a wider range of noise models than handled by current methods. We provide theoretical guarantees and experimental results on the quality of our approximation of the sampled scalar field.
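    The abstract above names local median filtering over k nearest neighbors as the inspiration for handling aberrant function values. The following is only a minimal sketch of that general idea, not the authors' algorithm: the function name `knn_median_filter`, the brute-force neighbor search, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import median

def knn_median_filter(points, values, k):
    """Replace each value by the median of the values at its k nearest sample points.

    Hypothetical helper for illustration; neighbors are found by brute force and
    each point counts as its own nearest neighbor (distance 0).
    """
    smoothed = []
    for p in points:
        # Indices of the k sample points closest to p in Euclidean distance.
        nearest = sorted(range(len(points)),
                         key=lambda j: math.dist(p, points[j]))[:k]
        smoothed.append(median(values[j] for j in nearest))
    return smoothed

# Toy 1-D sample of f(x) = x with one aberrant value at x = 2.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
values = [0.0, 1.0, 100.0, 3.0, 4.0]
filtered = knn_median_filter(points, values, k=3)
```

    In this toy sample the aberrant value 100.0 is replaced by the median of its neighborhood, while the values at the other points stay within the range of the clean data.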

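    Techniques, Methods, and Ideology in the Indonesian Translation of the Book Economic Concepts of Ibn Taimiyah and Their Impact on Translation Quality (TEKNIK, METODE DAN IDEOLOGI PENERJEMAHAN BUKU ECONOMIC CONCEPTS OF IBN TAIMIYAH KE DALAM BAHASA INDONESIA DAN DAMPAKNYA PADA KUALITAS TERJEMAHAN)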

    Sakut Anshori. S130907011. 2010. Techniques, Methods, and Ideology in the Indonesian Translation of the Book Economic Concepts of Ibn Taimiyah and Their Impact on Translation Quality. Thesis. Postgraduate Master's Program in Linguistics, Translation Studies Major. Universitas Sebelas Maret Surakarta. This study aims to identify and describe the translation techniques, methods, and ideology, and to examine their impact on translation quality in terms of accuracy, acceptability, and readability. It is a descriptive, embedded qualitative study of a single case. The study draws on two types of data source. The first is documents, namely the source book and its translation, which serve as objective data. The second is informants who provided information on the accuracy, acceptability, and readability of the translation, which serves as affective data. Data were collected by identifying techniques through document analysis, distributing questionnaires, and conducting in-depth interviews. Data samples were selected by purposive sampling. The results show that there are 14 types of translation technique among the 593 techniques used by the translator across 165 data items. By frequency of use, the techniques are: literal translation 187 (31.53%), pure borrowing 132 (22.26%), established equivalent 78 (13.15%), modulation 44 (7.42%), amplification 30 (5.06%), addition 30 (5.06%), naturalized borrowing 24 (4.05%), calque 21 (3.54%), reduction 18 (3.03%), explicitation 10 (1.69%), particularization 8 (1.35%), omission 6 (1.01%), and description 3 (0.51%). Based on the dominant techniques, the book tends to use the literal translation method with a foreignization ideology. The impact of these translation techniques on translation quality is fairly good, with average scores of 2.53 for accuracy, 2.73 for acceptability, and 2.91 for readability. This indicates that the translation has good accuracy, acceptability, and readability. The techniques that contribute most positively to accuracy, acceptability, and readability are literal translation, pure borrowing, and established equivalent. Meanwhile, the techniques that most reduce accuracy and acceptability are modulation, addition, and omission. As an implication, translators need to improve their translation competence and must be careful in choosing translation techniques in order to produce good-quality translations. Keywords: translation techniques, translation method, translation ideology, translation quality, accuracy, acceptability, and readability

    Sequence-based GWAS, network and pathway analyses reveal genes co-associated with milk cheese-making properties and milk composition in Montbéliarde cows

    Background: Milk quality in dairy cattle is routinely assessed via analysis of mid-infrared (MIR) spectra; this approach can also be used to predict the milk’s cheese-making properties (CMP) and composition. When this method of high-throughput phenotyping is combined with efficient imputations of whole-genome sequence data from cows’ genotyping data, it provides a unique and powerful framework with which to carry out genomic analyses. The goal of this study was to use this approach to identify genes and gene networks associated with milk CMP and composition in the Montbéliarde breed. Results: Milk cheese yields, coagulation traits, milk pH and contents of proteins, fatty acids, minerals, citrate, and lactose were predicted from MIR spectra. Thirty-six phenotypes from primiparous Montbéliarde cows (1,442,371 test-day records from 189,817 cows) were adjusted for non-genetic effects and averaged per cow. 50 K genotypes, which were available for a subset of 19,586 cows, were imputed at the sequence level using Run6 of the 1000 Bull Genomes Project (comprising 2333 animals). The individual effects of 8.5 million variants were evaluated in a genome-wide association study (GWAS), which led to the detection of 59 QTL regions, most of which had highly significant effects on CMP and milk composition. The results of the GWAS were further subjected to an association weight matrix and the partial correlation and information theory approach, and we identified a set of 736 co-associated genes. Among these, the well-known caseins, PAEP and DGAT1, together with dozens of other genes such as SLC37A1, ALPL, MGST1, SEL1L3, GPT, BRI3BP, SCD, GPAT4, FASN, and ANKH, explained from 12 to 30% of the phenotypic variance of CMP traits. We were further able to identify metabolic pathways (e.g., phosphate and phospholipid metabolism and inorganic anion transport) and key regulator genes, such as PPARA, ASXL3, and bta-mir-200c, that are functionally linked to milk composition. Conclusions: By using an approach that integrated GWAS with network and pathway analyses at the whole-genome sequence level, we propose candidate variants that explain a substantial proportion of the phenotypic variance of CMP traits and could thus be included in genomic evaluation models to improve milk CMP in Montbéliarde cows.

    Inférence topologique à partir de mesures (Topological inference from measures)

    Massive amounts of data are now available for study. Asking questions that are both relevant and possible to answer is a difficult task. One can look for something other than the answer to a precise question. Topological data analysis looks for structure in point cloud data, which can be informative by itself but can also provide directions for further questioning. A common challenge faced in this area is the choice of the right scale at which to process the data. One widely used tool in this domain is persistent homology. By processing the data at all scales, it does not rely on a particular choice of scale. Moreover, its stability properties provide a natural way to go from discrete data to an underlying continuous structure. Finally, it can be combined with other tools, like the distance to a measure, which makes it possible to handle unbounded noise. The main caveat of this approach is its high complexity. In this thesis, we introduce topological data analysis and persistent homology, then show how to use approximation to reduce the computational complexity. We provide an approximation scheme for the distance to a measure and a sparsification method for weighted Vietoris-Rips complexes in order to approximate persistence diagrams with practical complexity. We detail the specific properties of these constructions. Persistent homology was previously shown to be of use for scalar field analysis. We provide a way to combine it with the distance to a measure in order to handle a wider class of noise, especially data with unbounded errors. Finally, we discuss interesting opportunities opened by these results to study data where parts are missing or erroneous.
    The amount of available data has never been so large. Asking the right questions, that is, questions that are both relevant and answerable, is difficult. Topological data analysis tries to sidestep the problem by not asking too precise a question and instead looking for a structure underlying the data. Such a structure is interesting in itself, but it can also guide the analyst's questioning and direct it toward relevant questions. One of the most widely used tools in this area is persistent homology. By analyzing the data at all scales simultaneously, persistence avoids the choice of a particular scale. Moreover, its stability properties provide a natural way to go from discrete data to continuous objects. However, persistent homology faces two obstacles. Its construction generally runs into data structures that are too large for work in high dimensions, and its robustness does not extend to aberrant noise, that is, to the presence of points uncorrelated with the underlying structure. In this thesis, I start from these two observations and first set out to make the computation of persistent homology robust to aberrant noise by using the distance to a measure. Using an approximation of the computation of persistent homology for the distance to a measure, I provide a complete algorithm that makes it possible to use persistent homology for the topological analysis of data of small intrinsic dimension that may be embedded in high-dimensional spaces. Previously, persistent homology has also been used to analyze scalar fields. Here again, the problem of aberrant noise limited its use, and I propose a method derived from the distance to a measure in order to obtain robustness to aberrant noise. This involves introducing new noise conditions and a new regression operator, both of which are studied specifically. The work carried out during this thesis now makes it possible to use persistent homology in real high-dimensional applications, whether for topological inference or for scalar field analysis.
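    The distance to a measure mentioned in this abstract has a simple form for a finite sample with uniform weights: the root mean squared distance from a query point to its k nearest sample points. The sketch below computes that empirical quantity by brute force; it is not the approximation scheme of the thesis, and the function name `distance_to_measure` and the toy sample are assumptions for illustration.

```python
import math

def distance_to_measure(x, sample, k):
    """Empirical distance to a measure: sqrt of the mean squared distance
    from x to its k nearest points in the sample (uniform weights assumed)."""
    # Squared Euclidean distances to all sample points, smallest k kept.
    nearest_sq = sorted(math.dist(x, p) ** 2 for p in sample)[:k]
    return math.sqrt(sum(nearest_sq) / k)

# Three clustered points and one isolated outlier.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
d_near = distance_to_measure((0.0, 0.0), sample, k=2)
d_far = distance_to_measure((10.0, 10.0), sample, k=2)
```

    The ordinary distance to the sample is zero at both query points, since both belong to the sample. With k = 2 the value at the isolated point is much larger than at the clustered point, which is what lets a sublevel-set construction ignore the outlier until large scales.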

    Sparse Higher Order Čech Filtrations

    For a finite set of balls of radius $r$, the $k$-fold cover is the space covered by at least $k$ balls. Fixing the ball centers and varying the radius, we obtain a nested sequence of spaces that is called the $k$-fold filtration of the centers. For $k=1$, the construction is the union-of-balls filtration that is popular in topological data analysis. For larger $k$, it yields a cleaner shape reconstruction in the presence of outliers. We contribute a sparsification algorithm to approximate the topology of the $k$-fold filtration. Our method is a combination and adaptation of several techniques from the well-studied case $k=1$, resulting in a sparsification of linear size that can be computed in expected near-linear time with respect to the number of input points. Our method also extends to the multicover bifiltration, composed of the $k$-fold filtrations for several values of $k$, with the same size and complexity bounds. Comment: Extended journal version.
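    The definition of the $k$-fold cover above can be checked directly on small inputs: a point lies in the $k$-fold cover at radius $r$ exactly when at least $k$ centers are within distance $r$ of it, so it enters the $k$-fold filtration at its distance to the $k$-th nearest center. The sketch below illustrates only this definition, not the sparsification algorithm of the paper; closed balls, the function names, and the sample centers are assumptions.

```python
import math

def in_kfold_cover(x, centers, r, k):
    """True if x is covered by at least k closed balls of radius r."""
    return sum(math.dist(x, c) <= r for c in centers) >= k

def kfold_entry_radius(x, centers, k):
    """Smallest radius at which x belongs to the k-fold cover:
    the distance from x to its k-th nearest center."""
    return sorted(math.dist(x, c) for c in centers)[k - 1]

# Two nearby centers and one far-away outlier center.
centers = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
x = (1.0, 0.0)

covered_twice = in_kfold_cover(x, centers, r=1.0, k=2)
covered_thrice = in_kfold_cover(x, centers, r=1.0, k=3)
entry_radii = [kfold_entry_radius(x, centers, k) for k in (1, 2, 3)]
```

    For this configuration the query point enters the 1-fold and 2-fold covers at the same radius, and the 3-fold cover only once the ball around the far center reaches it.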